Simulation-Based Algorithms for Markov Decision Processes by Hyeong Soo Chang Jiaqiao Hu Michael C. Fu & Steven I. Marcus


Author: Hyeong Soo Chang, Jiaqiao Hu, Michael C. Fu & Steven I. Marcus
Language: eng
Format: epub
Publisher: Springer London, London


where E_{θₖ₊₁} and E_{gₖ₊₁} denote the expectations taken with respect to f(⋅,θₖ₊₁) and gₖ₊₁, respectively.
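A small numerical illustration of this kind of expectation-matching property follows. It is a sketch under stated assumptions, not the book's code: it assumes a one-dimensional Gaussian family, which is an NEF with sufficient statistic Γ(x) = (x, x²), and it uses a discrete distribution with hypothetical weights `ws` on points `xs` as a stand-in for gₖ₊₁. The function names `weighted_moments`, `fit_gaussian`, and `expected_loglik` are illustrative. Maximizing E_g[ln f(X, θ)] over the Gaussian family yields a θ whose first two moments equal those of g.

```python
import math


def weighted_moments(xs, ws):
    """Return (E_g[X], E_g[X^2]) for a discrete g with (unnormalized) weights ws on points xs."""
    total = sum(ws)
    m1 = sum(w * x for x, w in zip(xs, ws)) / total
    m2 = sum(w * x * x for x, w in zip(xs, ws)) / total
    return m1, m2


def fit_gaussian(xs, ws):
    """Maximize E_g[ln f(X; mu, var)] over normal densities.

    For the Gaussian NEF the maximizer is closed-form: it matches the
    expectations of the sufficient statistic (x, x^2) under g.
    """
    m1, m2 = weighted_moments(xs, ws)
    return m1, m2 - m1 * m1


def expected_loglik(xs, ws, mu, var):
    """E_g[ln f(X; mu, var)] where f is the N(mu, var) density."""
    total = sum(ws)
    return sum(
        w * (-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var))
        for x, w in zip(xs, ws)
    ) / total


# Hypothetical target distribution g: three support points with weights.
xs = [0.0, 1.0, 3.0]
ws = [0.2, 0.5, 0.3]

mu, var = fit_gaussian(xs, ws)
m1, m2 = weighted_moments(xs, ws)

# Under N(mu, var): E[X] = mu and E[X^2] = var + mu^2, which equal m1 and m2.
print("fitted mu, var:", mu, var)
print("E_theta[X], E_g[X]:", mu, m1)
print("E_theta[X^2], E_g[X^2]:", var + mu * mu, m2)
```

The Gaussian case is chosen only because its maximizer is available in closed form; for a general NEF one would typically solve E_θ[Γ(X)] = E_g[Γ(X)] numerically for θ.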

Proof

Define . Since f(⋅,θ) belongs to the NEF, we can write

Thus the gradient of with respect to θ can be expressed as

where the validity of the interchange of derivative and integral above is guaranteed by the dominated convergence theorem.
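For reference, the standard NEF computation behind this type of gradient can be written out as follows. This is a generic sketch assuming the usual NEF form with sufficient statistic Γ, carrier function ℓ, and reference measure ν, and with a generic density g standing in for gₖ₊₁. It illustrates the mechanism rather than reproducing the book's own display. The term E_g[ln ℓ(X)] does not depend on θ and drops out of the gradient.

```latex
\begin{aligned}
f(x,\theta) &= \exp\!\big(\theta^{\top}\Gamma(x) - K(\theta)\big)\,\ell(x),
\qquad K(\theta) = \ln \int e^{\theta^{\top}\Gamma(y)}\,\ell(y)\,\nu(dy),\\
\nabla_\theta K(\theta) &= \frac{\int \Gamma(y)\,e^{\theta^{\top}\Gamma(y)}\,\ell(y)\,\nu(dy)}
{\int e^{\theta^{\top}\Gamma(y)}\,\ell(y)\,\nu(dy)} = E_{\theta}[\Gamma(X)],\\
\nabla_\theta\, E_{g}\big[\ln f(X,\theta)\big] &= E_{g}[\Gamma(X)] - \nabla_\theta K(\theta)
= E_{g}[\Gamma(X)] - E_{\theta}[\Gamma(X)].
\end{aligned}
```

Setting the last expression to zero gives a moment-matching condition of the form E_θ[Γ(X)] = E_g[Γ(X)]. The second line is where an interchange of derivative and integral is needed.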

By Assumption A3 and the non-decreasing property of the sequence , it turns out that the above gradient is finite and thus well-defined. Moreover, since ρ>0, it can be seen from the MRAS0 algorithm that the set has a strictly positive Lebesgue/counting measure. It follows that we must have .

By setting , it immediately follows that





